Parallel and Distributed Computing, Applications and Technologies by Unknown

Author: Unknown
Language: eng
Format: epub
ISBN: 9789811359071
Publisher: Springer Singapore


The additional hardware for our proposed LDU has a limited number of entries, which store partial tags of L1 data cache accesses. Figure 6 shows the performance impact of varying the number of LDU entries from 4 to 32. In our experiments, the optimal number of entries differs across benchmarks; the SC benchmark shows an improvement of up to 5.8% with 32 entries. On average, 16 entries is an adequate design choice for the LDU.

Fig. 6. IPC comparison according to the number of LDU entries
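The LDU described above can be pictured as a small fixed-capacity table of partial tags. The following is a minimal behavioral sketch, not the paper's implementation: the partial-tag width, the 128-byte line size, and the LRU replacement policy are all assumptions made for illustration.

```python
from collections import OrderedDict

PARTIAL_TAG_BITS = 8  # assumed partial-tag width (not specified in the paper)
LINE_OFFSET_BITS = 7  # assumed 128-byte L1D cache lines


class LDU:
    """Behavioral sketch of a Locality Detection Unit: a small table holding
    partial tags of recent L1 data cache accesses, with LRU replacement."""

    def __init__(self, entries=16):  # 16 entries: the adequate choice on average
        self.entries = entries
        self.table = OrderedDict()  # partial tag -> warp id of the last access

    def access(self, addr, warp_id):
        """Record an L1D access; return (hit, warp id of the previous access)."""
        tag = (addr >> LINE_OFFSET_BITS) & ((1 << PARTIAL_TAG_BITS) - 1)
        hit = tag in self.table
        if hit:
            self.table.move_to_end(tag)  # refresh LRU position
        elif len(self.table) >= self.entries:
            self.table.popitem(last=False)  # evict the least recently used entry
        prev_warp = self.table.get(tag)
        self.table[tag] = warp_id
        return hit, prev_warp
```

Comparing the warp id of a hit against the warp that last touched the same tag is one way the hardware could distinguish intra-warp from inter-warp reuse.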

5 Conclusion

In this paper, we exploited the locality types of GPU workloads based on the access behavior of warps. According to our analysis, the LRR policy exploits inter-warp locality and improves performance more than the GTO policy for such workloads, whereas GTO enhances intra-warp locality. To select between GTO and LRR dynamically, our proposed GPU architecture stores access information for the L1 data cache. The proposed scheduler improves the L1 data cache efficiency of the GPU by dynamically selecting the better scheduling policy between GTO and LRR. According to our simulation results, the proposed scheduler improves overall performance by 19% over LRR and by 3.8% over GTO.
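The dynamic selection between GTO and LRR can be sketched as a counter-based mechanism. This is a hypothetical illustration, assuming the choice is driven by comparing intra-warp and inter-warp L1D hit counts over a sampling window; the window size and decision rule are not taken from the paper.

```python
class DynamicScheduler:
    """Sketch of dynamic GTO/LRR selection from L1 data cache access behavior."""

    def __init__(self, window=1000):  # assumed sampling window, in L1D hits
        self.window = window
        self.intra_hits = 0   # hits where the same warp reused the line
        self.inter_hits = 0   # hits where a different warp reused the line
        self.samples = 0
        self.policy = "GTO"   # assumed default policy

    def record_hit(self, hit_warp, last_warp):
        """Classify an L1D hit and, at the end of a window, re-pick the policy."""
        self.samples += 1
        if hit_warp == last_warp:
            self.intra_hits += 1
        else:
            self.inter_hits += 1
        if self.samples >= self.window:
            # GTO favors intra-warp locality; LRR favors inter-warp locality.
            self.policy = "GTO" if self.intra_hits >= self.inter_hits else "LRR"
            self.intra_hits = self.inter_hits = self.samples = 0
        return self.policy
```

The counters reset each window so the scheduler can track phase changes in the workload; a real design would also have to bound the counter widths.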





